Coronavirus disease (COVID-19) is a pandemic infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS‑CoV‑2). COVID-19 has affected 212 countries and territories around the world. The virus and the disease were unknown before the outbreak began in Wuhan, China, in December 2019. The time between exposure to the virus and the onset of symptoms is commonly around five to six days but can range from 1 to 14 days. I have gathered COVID-19 data from different sources. The datasets contain confirmed cases, deaths, recovered cases, and new and active cases for the world, for India, and separately for each Indian state/union territory.
1. All countries: data scraped with Beautiful Soup from https://www.worldometers.info/coronavirus/
2. COVID-19 analysis of India: 'covid_19_india.csv'
3. Analysis by age group: 'AgeGroupDetails.csv'
4. Analysis of ICMR testing labs: 'ICMRTestingDetails.csv'
5. Analysis of testing done in Indian states: 'StatewiseTestingDetails.csv'
Source: https://www.kaggle.com/sudalairajkumar/covid19-in-india
6. Time series of confirmed cases worldwide: 'time_series_covid19_confirmed.csv'
7. Time series of recovered cases worldwide: 'time_series_covid19_recovered.csv'
8. Time series of deaths worldwide: 'time_series_covid19_deaths.csv'
Source: https://data.humdata.org/dataset/5dff64bc-a671-48da-aa87-2ca40d7abf02
* Dataset columns (worldometers data):
- Total_cases
- Total_deaths
- Total_Recovered
- New_cases
- New Deaths
- Active_Cases
- Serious_Case
- TotCases/1Mpop
- Deaths/1Mpop
- Total Tests
- Test/1Mpop
full_data.head(10)
| | Country,Other | TotalCases | NewCases | TotalDeaths | NewDeaths | TotalRecovered | ActiveCases | Serious,Critical | Tot Cases/1M pop | Deaths/1M pop | TotalTests | Tests/ 1M pop |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | World | 4080142 | +70,851 | 279280.0 | +3,304 | 1425122.0 | 2375740 | 47667.0 | 523.0 | 35.8 | NaN | NaN |
| 1 | USA | 1341281 | +19,496 | 79823.0 | +1,208 | 232360.0 | 1029098 | 16796.0 | 4052.0 | 241.0 | 8571364.0 | 25895.0 |
| 2 | Spain | 262783 | +2,666 | 26478.0 | +179 | 173157.0 | 63148 | 1741.0 | 5620.0 | 566.0 | 2467761.0 | 52781.0 |
| 3 | Italy | 218268 | +1,083 | 30395.0 | +194 | 103031.0 | 84842 | 1034.0 | 3610.0 | 503.0 | 2514234.0 | 41584.0 |
| 4 | UK | 215260 | +3,896 | 31587.0 | +346 | NaN | 183329 | 1559.0 | 3171.0 | 465.0 | 1728443.0 | 25461.0 |
| 5 | Russia | 198676 | +10,817 | 1827.0 | +104 | 31916.0 | 164933 | 2300.0 | 1361.0 | 13.0 | 5221964.0 | 35783.0 |
| 6 | France | 176658 | +579 | 26310.0 | +80 | 56038.0 | 94310 | 2812.0 | 2706.0 | 403.0 | 1384633.0 | 21213.0 |
| 7 | Germany | 171264 | +676 | 7543.0 | +33 | 143300.0 | 20421 | 1650.0 | 2044.0 | 90.0 | 2755770.0 | 32891.0 |
| 8 | Brazil | 148670 | +2,778 | 10100.0 | +108 | 59297.0 | 79273 | 8318.0 | 699.0 | 48.0 | 339552.0 | 1597.0 |
| 9 | Turkey | 137115 | +1,546 | 3739.0 | +50 | 89480.0 | 43896 | 1168.0 | 1626.0 | 44.0 | 1334411.0 | 15822.0 |
covid_india.head(5)
| | Sno | Date | Time | State/UnionTerritory | ConfirmedIndianNational | ConfirmedForeignNational | Cured | Deaths | Confirmed |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 30/01/20 | 6:00 PM | Kerala | 1 | 0 | 0 | 0 | 1 |
| 1 | 2 | 31/01/20 | 6:00 PM | Kerala | 1 | 0 | 0 | 0 | 1 |
| 2 | 3 | 01/02/20 | 6:00 PM | Kerala | 2 | 0 | 0 | 0 | 2 |
| 3 | 4 | 02/02/20 | 6:00 PM | Kerala | 3 | 0 | 0 | 0 | 3 |
| 4 | 5 | 03/02/20 | 6:00 PM | Kerala | 3 | 0 | 0 | 0 | 3 |
On 30 January 2020, India reported its first case, in Kerala.
On 2 March 2020, India reported COVID-19 cases in Telangana and Delhi; after that, cases were reported in other states and union territories.
On 13 March 2020, Karnataka reported the death of one patient due to COVID-19.
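Dates like these can be read off covid_19_india.csv by taking, per state, the earliest Date with a positive count. A minimal sketch, assuming the day-first `dd/mm/yy` format seen in the head(5) output; the two Kerala rows come from that output, while the other rows and their counts are hypothetical stand-ins, and `first_date` is an illustrative helper name.

```python
import pandas as pd

# Same layout as covid_19_india.csv; Kerala rows from head(5), the rest hypothetical
covid_india = pd.DataFrame({
    'Date': ['30/01/20', '31/01/20', '02/03/20', '02/03/20', '13/03/20'],
    'State/UnionTerritory': ['Kerala', 'Kerala', 'Telangana', 'Delhi', 'Karnataka'],
    'Confirmed': [1, 1, 1, 1, 6],
    'Deaths': [0, 0, 0, 0, 1],
})

# The file stores dates day-first (dd/mm/yy)
covid_india['Date'] = pd.to_datetime(covid_india['Date'], format='%d/%m/%y')

def first_date(df, column):
    """Earliest Date on which `column` is positive, for each state (hypothetical helper)."""
    positive = df[df[column] > 0]
    return positive.groupby('State/UnionTerritory')['Date'].min()

first_case = first_date(covid_india, 'Confirmed')
first_death = first_date(covid_india, 'Deaths')
print(first_case.sort_values())
print(first_death)
```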
covid_india_age = pd.read_csv('AgeGroupDetails.csv')
covid_india_age
| | Sno | AgeGroup | TotalCases | Percentage |
|---|---|---|---|---|
| 0 | 1 | 0-9 | 22 | 3.18% |
| 1 | 2 | 10-19 | 27 | 3.90% |
| 2 | 3 | 20-29 | 172 | 24.86% |
| 3 | 4 | 30-39 | 146 | 21.10% |
| 4 | 5 | 40-49 | 112 | 16.18% |
| 5 | 6 | 50-59 | 77 | 11.13% |
| 6 | 7 | 60-69 | 89 | 12.86% |
| 7 | 8 | 70-79 | 28 | 4.05% |
| 8 | 9 | >=80 | 10 | 1.45% |
| 9 | 10 | Missing | 9 | 1.30% |
covid_india_testing = pd.read_csv('ICMRTestingDetails.csv')
covid_india_testing.tail(5)
| | SNo | DateTime | TotalSamplesTested | TotalIndividualsTested | TotalPositiveCases |
|---|---|---|---|---|---|
| 37 | 38 | 23/04/20 9:00 | 541789.0 | 525667.0 | 23502.0 |
| 38 | 39 | 24/04/20 9:00 | 579957.0 | NaN | NaN |
| 39 | 40 | 25/04/20 9:00 | 625309.0 | NaN | NaN |
| 40 | 41 | 26/04/20 9:00 | 665819.0 | NaN | NaN |
| 41 | 42 | 27/04/20 9:00 | 716733.0 | NaN | NaN |
covid_india_state_testing = pd.read_csv('StatewiseTestingDetails.csv')
covid_india_state_testing
| | Date | State | TotalSamples | Negative | Positive |
|---|---|---|---|---|---|
| 0 | 2020-04-17 | Andaman and Nicobar Islands | 1403.0 | 1210.0 | 12.0 |
| 1 | 2020-04-24 | Andaman and Nicobar Islands | 2679.0 | NaN | 27.0 |
| 2 | 2020-04-27 | Andaman and Nicobar Islands | 2848.0 | NaN | 33.0 |
| 3 | 2020-05-01 | Andaman and Nicobar Islands | 3754.0 | NaN | 33.0 |
| 4 | 2020-04-02 | Andhra Pradesh | 1800.0 | 1175.0 | 132.0 |
| ... | ... | ... | ... | ... | ... |
| 754 | 2020-04-30 | West Bengal | 16525.0 | NaN | 758.0 |
| 755 | 2020-05-01 | West Bengal | 18566.0 | NaN | NaN |
| 756 | 2020-05-02 | West Bengal | 20976.0 | NaN | 795.0 |
| 757 | 2020-05-03 | West Bengal | 22915.0 | NaN | 922.0 |
| 758 | 2020-05-04 | West Bengal | 25116.0 | NaN | 1259.0 |
759 rows × 5 columns
- The time-series data contain date and country information: one row per country (or province) and one column per date.
covid_time_series_C = pd.read_csv('time_series_covid19_confirmed.csv')
covid_time_series_C
| | Province/State | Country/Region | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | ... | 4/27/20 | 4/28/20 | 4/29/20 | 4/30/20 | 5/1/20 | 5/2/20 | 5/3/20 | 5/4/20 | 5/5/20 | 5/6/20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | Afghanistan | 33.000000 | 65.000000 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1703 | 1828 | 1939 | 2171 | 2335 | 2469 | 2704 | 2894 | 3224 | 3392 |
| 1 | NaN | Albania | 41.153300 | 20.168300 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 736 | 750 | 766 | 773 | 782 | 789 | 795 | 803 | 820 | 832 |
| 2 | NaN | Algeria | 28.033900 | 1.659600 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 3517 | 3649 | 3848 | 4006 | 4154 | 4295 | 4474 | 4648 | 4838 | 4997 |
| 3 | NaN | Andorra | 42.506300 | 1.521800 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 743 | 743 | 743 | 745 | 745 | 747 | 748 | 750 | 751 | 751 |
| 4 | NaN | Angola | -11.202700 | 17.873900 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 27 | 27 | 27 | 27 | 30 | 35 | 35 | 35 | 36 | 36 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 261 | NaN | Western Sahara | 24.215500 | -12.885800 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| 262 | NaN | Sao Tome and Principe | 0.186360 | 6.613081 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 4 | 8 | 8 | 14 | 16 | 16 | 16 | 23 | 174 | 174 |
| 263 | NaN | Yemen | 15.552727 | 48.516388 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 1 | 6 | 6 | 7 | 10 | 10 | 12 | 22 | 25 |
| 264 | NaN | Comoros | -11.645500 | 43.333300 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 1 | 3 | 3 | 3 | 3 | 8 |
| 265 | NaN | Tajikistan | 38.861034 | 71.276093 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 15 | 15 | 76 | 128 | 230 | 293 | 379 |
266 rows × 110 columns
covid_time_series_covid_19_R = pd.read_csv('time_series_covid19_recovered.csv')
covid_time_series_covid_19_R
| | Province/State | Country/Region | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | ... | 4/27/20 | 4/28/20 | 4/29/20 | 4/30/20 | 5/1/20 | 5/2/20 | 5/3/20 | 5/4/20 | 5/5/20 | 5/6/20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | Afghanistan | 33.000000 | 65.000000 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 220 | 228 | 252 | 260 | 310 | 331 | 345 | 397 | 421 | 458 |
| 1 | NaN | Albania | 41.153300 | 20.168300 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 422 | 431 | 455 | 470 | 488 | 519 | 531 | 543 | 570 | 595 |
| 2 | NaN | Algeria | 28.033900 | 1.659600 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1558 | 1651 | 1702 | 1779 | 1821 | 1872 | 1936 | 1998 | 2067 | 2197 |
| 3 | NaN | Andorra | 42.506300 | 1.521800 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 385 | 398 | 423 | 468 | 468 | 472 | 493 | 499 | 514 | 521 |
| 4 | NaN | Angola | -11.202700 | 17.873900 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 6 | 6 | 7 | 7 | 11 | 11 | 11 | 11 | 11 | 11 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 247 | NaN | Western Sahara | 24.215500 | -12.885800 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| 248 | NaN | Sao Tome and Principe | 0.186360 | 6.613081 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 249 | NaN | Yemen | 15.552727 | 48.516388 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 250 | NaN | Comoros | -11.645500 | 43.333300 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 251 | NaN | Tajikistan | 38.861034 | 71.276093 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
252 rows × 110 columns
covid_time_series_D = pd.read_csv('time_series_covid19_deaths.csv')
covid_time_series_D
| | Province/State | Country/Region | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | ... | 4/27/20 | 4/28/20 | 4/29/20 | 4/30/20 | 5/1/20 | 5/2/20 | 5/3/20 | 5/4/20 | 5/5/20 | 5/6/20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | Afghanistan | 33.000000 | 65.000000 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 57 | 58 | 60 | 64 | 68 | 72 | 85 | 90 | 95 | 104 |
| 1 | NaN | Albania | 41.153300 | 20.168300 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 28 | 30 | 30 | 31 | 31 | 31 | 31 | 31 | 31 | 31 |
| 2 | NaN | Algeria | 28.033900 | 1.659600 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 432 | 437 | 444 | 450 | 453 | 459 | 463 | 465 | 470 | 476 |
| 3 | NaN | Andorra | 42.506300 | 1.521800 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 40 | 41 | 42 | 42 | 43 | 44 | 45 | 45 | 46 | 46 |
| 4 | NaN | Angola | -11.202700 | 17.873900 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 261 | NaN | Western Sahara | 24.215500 | -12.885800 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 262 | NaN | Sao Tome and Principe | 0.186360 | 6.613081 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 3 | 3 | 3 |
| 263 | NaN | Yemen | 15.552727 | 48.516388 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 2 | 2 | 2 | 2 | 2 | 4 | 5 |
| 264 | NaN | Comoros | -11.645500 | 43.333300 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 265 | NaN | Tajikistan | 38.861034 | 71.276093 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 3 | 5 | 8 |
266 rows × 110 columns
**Assessing full_data (COVID-19 data for all countries)**
full_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 216 entries, 0 to 215
Data columns (total 12 columns):
| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | Country,Other | 216 non-null | object |
| 1 | TotalCases | 216 non-null | int64 |
| 2 | NewCases | 122 non-null | object |
| 3 | TotalDeaths | 179 non-null | float64 |
| 4 | NewDeaths | 83 non-null | object |
| 5 | TotalRecovered | 209 non-null | float64 |
| 6 | ActiveCases | 216 non-null | int64 |
| 7 | Serious,Critical | 135 non-null | float64 |
| 8 | Tot Cases/1M pop | 214 non-null | float64 |
| 9 | Deaths/1M pop | 177 non-null | float64 |
| 10 | TotalTests | 183 non-null | float64 |
| 11 | Tests/ 1M pop | 183 non-null | float64 |
dtypes: float64(7), int64(2), object(3)
memory usage: 20.4+ KB
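At the time of writing this report, full_data.info() shows:
- 216 rows in total: 214 countries and territories plus the aggregate 'World' and 'Total:' rows.
- TotalDeaths is reported for 179 rows, i.e. 177 countries.
- TotalRecovered is reported for 209 rows, i.e. 207 countries.
- NewDeaths is reported for 83 rows, including the two aggregate rows.
- TotalTests is reported for 183 rows.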
**Note:** Some data is missing, because confirmed cases, deaths, or recovered cases were not reported for some countries. Such values cannot be analysed.
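The per-metric counts listed above can be reproduced by dropping the aggregate rows and counting non-missing values per column. A minimal sketch on a few rows copied from the head(10) output (NaN means "not reported"); the names `aggregates`, `countries_only` and `reported` are illustrative.

```python
import numpy as np
import pandas as pd

# A few rows in the scraped worldometers layout (values from head(10) above)
full_data = pd.DataFrame({
    'Country,Other': ['World', 'USA', 'UK', 'Russia'],
    'TotalDeaths': [279280.0, 79823.0, 31587.0, 1827.0],
    'TotalRecovered': [1425122.0, 232360.0, np.nan, 31916.0],
    'TotalTests': [np.nan, 8571364.0, 1728443.0, 5221964.0],
})

# Exclude aggregate rows so the counts refer to individual countries only
aggregates = ['World', 'Total:']
countries_only = full_data[~full_data['Country,Other'].isin(aggregates)]

# Number of countries that reported each metric (non-missing values per column)
reported = countries_only.drop(columns='Country,Other').notna().sum()
print(reported)
```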
**Rename the columns of the full_data dataframe and convert new cases/new deaths to integers.**
# NewCases/NewDeaths are strings such as "+70,851": strip "+" and ",", convert to
# numbers, and treat missing values (no new cases/deaths reported) as 0
for col in ['new_cases', 'new_deaths']:
    full_data[col] = pd.to_numeric(
        full_data[col].str.replace(r'[+,]', '', regex=True)
    ).fillna(0).astype('int64')
# Fill the remaining missing values (e.g. total_recovered, total_tests) with 0
full_data = full_data.fillna(0)
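The rename step itself is not shown in the cell above. A sketch of one way to do it, assuming a plain `DataFrame.rename` with a mapping whose keys are the scraped headers from the head(10) output and whose values are the snake_case names in the cleaned table; `RENAME`, `raw` and `df` are illustrative names, and the third row stands for a country with no new cases reported (NaN).

```python
import numpy as np
import pandas as pd

# Scraped worldometers headers -> names used in the cleaned table (both from the outputs)
RENAME = {
    'Country,Other': 'country', 'TotalCases': 'total_cases',
    'NewCases': 'new_cases', 'TotalDeaths': 'total_deaths',
    'NewDeaths': 'new_deaths', 'TotalRecovered': 'total_recovered',
    'ActiveCases': 'active_cases', 'Serious,Critical': 'serious',
    'Tot Cases/1M pop': 'tot cases/1m_pop', 'Deaths/1M pop': 'deaths/1m_pop',
    'TotalTests': 'total_tests', 'Tests/ 1M pop': 'tests/_1m_pop',
}

raw = pd.DataFrame({
    'Country,Other': ['USA', 'Spain', 'Western Sahara'],
    'NewCases': ['+19,496', '+2,666', np.nan],
    'NewDeaths': ['+1,208', '+179', np.nan],
})

# rename() silently ignores mapping keys that are not present in this small frame
df = raw.rename(columns=RENAME)

# "+19,496" -> 19496; missing values -> 0
for col in ['new_cases', 'new_deaths']:
    df[col] = (
        pd.to_numeric(df[col].str.replace(r'[+,]', '', regex=True))
        .fillna(0)
        .astype('int64')
    )
print(df)
```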
full_data
| | country | total_cases | new_cases | total_deaths | new_deaths | total_recovered | active_cases | serious | tot cases/1m_pop | deaths/1m_pop | total_tests | tests/_1m_pop |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | World | 4080142 | 70851 | 279280 | 3304 | 1425122 | 2375740 | 47667 | 523.0 | 35 | 0 | 0 |
| 1 | USA | 1341281 | 19496 | 79823 | 1208 | 232360 | 1029098 | 16796 | 4052.0 | 241 | 8571364 | 25895 |
| 2 | Spain | 262783 | 2666 | 26478 | 179 | 173157 | 63148 | 1741 | 5620.0 | 566 | 2467761 | 52781 |
| 3 | Italy | 218268 | 1083 | 30395 | 194 | 103031 | 84842 | 1034 | 3610.0 | 503 | 2514234 | 41584 |
| 4 | UK | 215260 | 3896 | 31587 | 346 | 0 | 183329 | 1559 | 3171.0 | 465 | 1728443 | 25461 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 211 | Western Sahara | 6 | 0 | 0 | 0 | 5 | 1 | 0 | 10.0 | 0 | 0 | 0 |
| 212 | Anguilla | 3 | 0 | 0 | 0 | 3 | 0 | 0 | 200.0 | 0 | 0 | 0 |
| 213 | Saint Pierre Miquelon | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 173.0 | 0 | 0 | 0 |
| 214 | China | 82887 | 1 | 4633 | 0 | 78046 | 208 | 15 | 58.0 | 3 | 0 | 0 |
| 215 | Total: | 4080142 | 70851 | 279280 | 3304 | 1425122 | 2375740 | 47667 | 523.4 | 35 | 0 | 0 |
216 rows × 12 columns
full_data.describe()
| | total_cases | new_cases | total_deaths | new_deaths | total_recovered | active_cases | serious | tot cases/1m_pop | deaths/1m_pop | total_tests | tests/_1m_pop |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 2.140000e+02 | 214.000000 | 214.000000 | 214.000000 | 214.000000 | 2.140000e+02 | 214.000000 | 214.000000 | 214.000000 | 2.140000e+02 | 214.000000 |
| mean | 1.906608e+04 | 331.079439 | 1305.046729 | 15.439252 | 6656.672897 | 1.110159e+04 | 222.742991 | 982.191121 | 45.032710 | 2.148384e+05 | 15193.149533 |
| std | 9.865362e+04 | 1594.622212 | 6759.503801 | 89.700930 | 25562.683331 | 7.297667e+04 | 1325.467112 | 2107.016875 | 132.545621 | 7.737569e+05 | 26422.465363 |
| min | 1.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000 |
| 25% | 9.650000e+01 | 0.000000 | 2.000000 | 0.000000 | 30.250000 | 2.600000e+01 | 0.000000 | 43.500000 | 0.000000 | 8.655000e+02 | 488.250000 |
| 50% | 7.300000e+02 | 4.000000 | 14.000000 | 0.000000 | 257.500000 | 2.775000e+02 | 2.000000 | 200.000000 | 3.000000 | 1.526900e+04 | 3655.000000 |
| 75% | 5.597750e+03 | 85.750000 | 116.500000 | 3.000000 | 1845.000000 | 2.275500e+03 | 22.750000 | 1070.000000 | 24.750000 | 1.321298e+05 | 18152.750000 |
| max | 1.341281e+06 | 19496.000000 | 79823.000000 | 1208.000000 | 232360.000000 | 1.029098e+06 | 16796.000000 | 18773.000000 | 1208.000000 | 8.571364e+06 | 171971.000000 |
full_data.sum()
| Column | Sum |
|---|---|
| country | USASpainItalyUKRussiaFranceGermanyBrazilTurkey... |
| total_cases | 4080142 |
| new_cases | 70851 |
| total_deaths | 279280 |
| new_deaths | 3304 |
| total_recovered | 1424528 |
| active_cases | 2375740 |
| serious | 47667 |
| tot cases/1m_pop | 210189 |
| deaths/1m_pop | 9637 |
| total_tests | 45975420 |
| tests/_1m_pop | 3251334 |
dtype: object
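In the sum above, the country entry is every country name glued together: `sum()` is also applied to the object-dtype country column, where adding strings concatenates them. Passing `numeric_only=True` to `DataFrame.sum` restricts the totals to numeric columns. A small sketch using three rows from the head(10) output:

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['USA', 'Spain', 'Italy'],
    'total_cases': [1341281, 262783, 218268],
    'total_deaths': [79823, 26478, 30395],
})

# Plain sum() also "adds" the string column, concatenating the names
print(df.sum()['country'])  # USASpainItaly

# numeric_only=True keeps only the numeric totals
totals = df.sum(numeric_only=True)
print(totals)
```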
countries = pd.read_csv('countries_data.csv', encoding= 'unicode_escape')
countries
| | country | latitude | longitude | name |
|---|---|---|---|---|
| 0 | AD | 42.546245 | 1.601554 | Andorra |
| 1 | AE | 23.424076 | 53.847818 | United Arab Emirates |
| 2 | AF | 33.939110 | 67.709953 | Afghanistan |
| 3 | AG | 17.060816 | -61.796428 | Antigua and Barbuda |
| 4 | AI | 18.220554 | -63.068615 | Anguilla |
| ... | ... | ... | ... | ... |
| 270 | MS Zaandam | 52.442039 | 4.829199 | MS Zaandam |
| 271 | Caribbean Netherlands | 12.178400 | 68.2385 | Caribbean Netherlands |
| 272 | St. Barth | 17.900000 | 62.8333 | St. Barth |
| 273 | Saint Pierre Miquelon | 46.885200 | 56.3159 | Saint Pierre Miquelon |
| 274 | CAR | 6.611100 | 20.9394 | CAR |
275 rows × 4 columns
covid_india.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1671 entries, 0 to 1670
Data columns (total 9 columns):
| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | Sno | 1671 non-null | int64 |
| 1 | Date | 1671 non-null | object |
| 2 | Time | 1671 non-null | object |
| 3 | State/UnionTerritory | 1671 non-null | object |
| 4 | ConfirmedIndianNational | 1671 non-null | object |
| 5 | ConfirmedForeignNational | 1671 non-null | object |
| 6 | Cured | 1671 non-null | int64 |
| 7 | Deaths | 1671 non-null | int64 |
| 8 | Confirmed | 1671 non-null | int64 |
dtypes: int64(4), object(5)
memory usage: 117.6+ KB
covid_india.describe()
| | Sno | Cured | Deaths | Confirmed |
|---|---|---|---|---|
| count | 1671.000000 | 1671.000000 | 1671.000000 | 1671.000000 |
| mean | 836.000000 | 86.929982 | 13.054458 | 408.981448 |
| std | 482.520466 | 251.026352 | 49.430082 | 1188.132537 |
| min | 1.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 418.500000 | 0.000000 | 0.000000 | 5.000000 |
| 50% | 836.000000 | 5.000000 | 0.000000 | 32.000000 |
| 75% | 1253.500000 | 34.000000 | 4.000000 | 258.500000 |
| max | 1671.000000 | 2819.000000 | 617.000000 | 15525.000000 |
covid_india_state_testing.describe()
| | TotalSamples | Negative | Positive |
|---|---|---|---|
| count | 759.000000 | 623.000000 | 750.000000 |
| mean | 20797.184453 | 21013.754414 | 762.126667 |
| std | 29959.945068 | 30047.832110 | 1542.251749 |
| min | 58.000000 | 0.000000 | 0.000000 |
| 25% | 2392.500000 | 2492.500000 | 33.000000 |
| 50% | 8612.000000 | 8310.000000 | 207.000000 |
| 75% | 24910.000000 | 24462.500000 | 737.500000 |
| max | 175323.000000 | 162349.000000 | 14541.000000 |
covid_time_series_C.describe()
| | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | 1/28/20 | 1/29/20 | ... | 4/27/20 | 4/28/20 | 4/29/20 | 4/30/20 | 5/1/20 | 5/2/20 | 5/3/20 | 5/4/20 | 5/5/20 | 5/6/20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 266.000000 | 266.000000 | 266.000000 | 266.000000 | 266.000000 | 266.000000 | 266.000000 | 266.000000 | 266.000000 | 266.000000 | ... | 266.000000 | 2.660000e+02 | 2.660000e+02 | 2.660000e+02 | 2.660000e+02 | 2.660000e+02 | 2.660000e+02 | 2.660000e+02 | 2.660000e+02 | 2.660000e+02 |
| mean | 21.259359 | 22.432499 | 2.086466 | 2.458647 | 3.537594 | 5.390977 | 7.962406 | 11.003759 | 20.969925 | 23.180451 | ... | 11367.375940 | 1.164357e+04 | 1.192589e+04 | 1.224381e+04 | 1.257059e+04 | 1.288475e+04 | 1.318319e+04 | 1.347013e+04 | 1.376952e+04 | 1.411782e+04 |
| std | 24.747943 | 70.478908 | 27.279200 | 27.377862 | 34.083035 | 47.434934 | 66.289178 | 89.313757 | 219.187744 | 220.524977 | ... | 65963.451143 | 6.750722e+04 | 6.918850e+04 | 7.102979e+04 | 7.311436e+04 | 7.495796e+04 | 7.657327e+04 | 7.801568e+04 | 7.957189e+04 | 8.121537e+04 |
| min | -51.796300 | -135.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 |
| 25% | 6.907750 | -18.093125 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 74.250000 | 7.500000e+01 | 7.600000e+01 | 7.725000e+01 | 8.100000e+01 | 8.200000e+01 | 8.225000e+01 | 8.675000e+01 | 9.350000e+01 | 9.525000e+01 |
| 50% | 23.488100 | 20.921188 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 423.500000 | 4.335000e+02 | 4.555000e+02 | 4.665000e+02 | 4.855000e+02 | 4.995000e+02 | 5.085000e+02 | 5.425000e+02 | 5.480000e+02 | 5.560000e+02 |
| 75% | 41.143200 | 77.191525 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 1974.500000 | 2.020000e+03 | 2.047250e+03 | 2.105250e+03 | 2.165750e+03 | 2.359250e+03 | 2.515000e+03 | 2.635750e+03 | 2.699750e+03 | 2.872750e+03 |
| max | 71.706900 | 178.065000 | 444.000000 | 444.000000 | 549.000000 | 761.000000 | 1058.000000 | 1423.000000 | 3554.000000 | 3554.000000 | ... | 988197.000000 | 1.012582e+06 | 1.039909e+06 | 1.069424e+06 | 1.103461e+06 | 1.132539e+06 | 1.158040e+06 | 1.180375e+06 | 1.204351e+06 | 1.228603e+06 |
8 rows × 108 columns
covid_time_series_covid_19_R.describe()
| | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | 1/28/20 | 1/29/20 | ... | 4/27/20 | 4/28/20 | 4/29/20 | 4/30/20 | 5/1/20 | 5/2/20 | 5/3/20 | 5/4/20 | 5/5/20 | 5/6/20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 252.000000 | 252.000000 | 252.000000 | 252.000000 | 252.000000 | 252.000000 | 252.000000 | 252.000000 | 252.000000 | 252.000000 | ... | 252.000000 | 252.000000 | 252.000000 | 252.000000 | 252.000000 | 252.000000 | 252.000000 | 252.000000 | 252.000000 | 252.000000 |
| mean | 19.997457 | 28.167963 | 0.111111 | 0.119048 | 0.142857 | 0.154762 | 0.206349 | 0.242063 | 0.424603 | 0.500000 | ... | 3466.972222 | 3598.980159 | 3763.591270 | 4023.297619 | 4176.250000 | 4337.746032 | 4465.222222 | 4613.984127 | 4757.269841 | 4942.115079 |
| std | 24.408240 | 67.225277 | 1.763834 | 1.767827 | 1.958592 | 2.024712 | 2.654973 | 2.858017 | 5.069858 | 5.577059 | ... | 14409.367219 | 14829.617179 | 15399.747262 | 16859.803967 | 17468.700495 | 18210.920341 | 18611.712422 | 19134.155254 | 19528.619764 | 20013.875983 |
| min | -51.796300 | -106.346800 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 6.565350 | -7.825200 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 18.000000 | 19.000000 | 19.750000 | 24.750000 | 25.750000 | 25.750000 | 27.000000 | 28.500000 | 30.000000 | 30.000000 |
| 50% | 21.805100 | 23.409400 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 170.000000 | 182.000000 | 182.000000 | 198.000000 | 209.500000 | 213.000000 | 226.000000 | 232.000000 | 242.000000 | 256.500000 |
| 75% | 39.329025 | 85.953175 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 927.500000 | 977.500000 | 1007.500000 | 1022.000000 | 1083.250000 | 1156.500000 | 1224.000000 | 1275.750000 | 1317.250000 | 1342.500000 |
| max | 71.706900 | 178.065000 | 28.000000 | 28.000000 | 31.000000 | 32.000000 | 42.000000 | 45.000000 | 80.000000 | 88.000000 | ... | 114500.000000 | 117400.000000 | 120720.000000 | 153947.000000 | 164015.000000 | 175382.000000 | 180152.000000 | 187180.000000 | 189791.000000 | 189910.000000 |
8 rows × 108 columns
covid_time_series_D.describe()
| | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | 1/28/20 | 1/29/20 | ... | 4/27/20 | 4/28/20 | 4/29/20 | 4/30/20 | 5/1/20 | 5/2/20 | 5/3/20 | 5/4/20 | 5/5/20 | 5/6/20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 266.000000 | 266.000000 | 266.000000 | 266.000000 | 266.000000 | 266.000000 | 266.000000 | 266.000000 | 266.000000 | 266.000000 | ... | 266.000000 | 266.000000 | 266.000000 | 266.000000 | 266.000000 | 266.000000 | 266.000000 | 266.000000 | 266.000000 | 266.000000 |
| mean | 21.259359 | 22.432499 | 0.063910 | 0.067669 | 0.097744 | 0.157895 | 0.210526 | 0.308271 | 0.492481 | 0.500000 | ... | 806.180451 | 830.071429 | 855.883459 | 877.281955 | 897.063910 | 916.571429 | 930.338346 | 945.627820 | 967.063910 | 991.845865 |
| std | 24.747943 | 70.478908 | 1.042337 | 1.043908 | 1.473615 | 2.453621 | 3.189730 | 4.660845 | 7.664297 | 7.664793 | ... | 4605.196763 | 4744.311185 | 4906.661988 | 5033.525489 | 5151.692272 | 5255.657646 | 5334.737320 | 5413.828455 | 5546.438359 | 5694.709721 |
| min | -51.796300 | -135.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 6.907750 | -18.093125 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 50% | 23.488100 | 20.921188 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 7.000000 | 7.000000 | 7.000000 | 8.000000 | 8.000000 | 8.000000 | 9.000000 | 9.000000 | 9.000000 | 9.000000 |
| 75% | 41.143200 | 77.191525 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 56.750000 | 58.000000 | 59.750000 | 61.000000 | 67.500000 | 70.500000 | 75.250000 | 78.000000 | 79.750000 | 85.750000 |
| max | 71.706900 | 178.065000 | 17.000000 | 17.000000 | 24.000000 | 40.000000 | 52.000000 | 76.000000 | 125.000000 | 125.000000 | ... | 56219.000000 | 58355.000000 | 60967.000000 | 62996.000000 | 64943.000000 | 66369.000000 | 67682.000000 | 68922.000000 | 71064.000000 | 73431.000000 |
8 rows × 108 columns
**Drop the Sno column from the covid_india_age dataframe.**
covid_india_age
| | AgeGroup | TotalCases | Percentage |
|---|---|---|---|
| 0 | 0-9 | 22 | 3.18% |
| 1 | 10-19 | 27 | 3.90% |
| 2 | 20-29 | 172 | 24.86% |
| 3 | 30-39 | 146 | 21.10% |
| 4 | 40-49 | 112 | 16.18% |
| 5 | 50-59 | 77 | 11.13% |
| 6 | 60-69 | 89 | 12.86% |
| 7 | 70-79 | 28 | 4.05% |
| 8 | >=80 | 10 | 1.45% |
| 9 | Missing | 9 | 1.30% |
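The drop itself is a single `DataFrame.drop` call; Sno is only a running serial number and carries no information. A minimal sketch on the first three rows of AgeGroupDetails.csv as shown earlier:

```python
import pandas as pd

# First rows of AgeGroupDetails.csv (values from the output above)
covid_india_age = pd.DataFrame({
    'Sno': [1, 2, 3],
    'AgeGroup': ['0-9', '10-19', '20-29'],
    'TotalCases': [22, 27, 172],
    'Percentage': ['3.18%', '3.90%', '24.86%'],
})

# Remove the serial-number column
covid_india_age = covid_india_age.drop(columns=['Sno'])
print(covid_india_age.columns.tolist())  # ['AgeGroup', 'TotalCases', 'Percentage']
```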
**Handle missing values (NaN) in the three covid_time_series dataframes: Confirmed, Recovered, Deaths.**
**Drop the Lat and Long columns from the covid time-series dataframes; they are irrelevant to this analysis.**
#covid_time_series_covid_19_R = covid_time_series_covid_19_R.drop(['Province/State','Lat','Long'], axis=1)
covid_time_series_covid_19_R = covid_time_series_covid_19_R.drop(['Lat','Long'], axis=1)
covid_time_series_covid_19_R
| | Province/State | Country/Region | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | 1/28/20 | 1/29/20 | ... | 4/27/20 | 4/28/20 | 4/29/20 | 4/30/20 | 5/1/20 | 5/2/20 | 5/3/20 | 5/4/20 | 5/5/20 | 5/6/20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Afghanistan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 220 | 228 | 252 | 260 | 310 | 331 | 345 | 397 | 421 | 458 |
| 1 | 0 | Albania | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 422 | 431 | 455 | 470 | 488 | 519 | 531 | 543 | 570 | 595 |
| 2 | 0 | Algeria | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1558 | 1651 | 1702 | 1779 | 1821 | 1872 | 1936 | 1998 | 2067 | 2197 |
| 3 | 0 | Andorra | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 385 | 398 | 423 | 468 | 468 | 472 | 493 | 499 | 514 | 521 |
| 4 | 0 | Angola | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 6 | 6 | 7 | 7 | 11 | 11 | 11 | 11 | 11 | 11 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 247 | 0 | Western Sahara | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| 248 | 0 | Sao Tome and Principe | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 249 | 0 | Yemen | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 250 | 0 | Comoros | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 251 | 0 | Tajikistan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
252 rows × 108 columns
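Only the Lat/Long drop for the recovered frame is shown, and the NaN handling for Province/State is not. A sketch of both steps on a toy frame, assuming Province/State is filled with 0 (the value that appears in its place in the outputs); `ts` is an illustrative name and the numbers come from the recovered-cases output.

```python
import numpy as np
import pandas as pd

# Toy wide table in the layout of time_series_covid19_recovered.csv
ts = pd.DataFrame({
    'Province/State': [np.nan, np.nan],
    'Country/Region': ['Afghanistan', 'Albania'],
    'Lat': [33.0, 41.1533],
    'Long': [65.0, 20.1683],
    '5/5/20': [421, 570],
    '5/6/20': [458, 595],
})

# Province/State is NaN for countries reported as a whole; fill it with 0
ts['Province/State'] = ts['Province/State'].fillna(0)

# Coordinates are not used in this analysis
ts = ts.drop(columns=['Lat', 'Long'])
print(ts)
```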
**The time-series data are difficult to analyze in their original wide format.**
Each day is stored in its own column, which is awkward for analysis. We first unpivot (melt) the date columns into a variable column 'Date' and a value column named 'Confirmed', 'Recovered', or 'Deaths' respectively.
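# Every non-identifier column is a date; selecting by name avoids skipping the first date
dates = covid_time_series_C.columns.drop(['Country/Region', 'Province/State', 'Lat', 'Long'], errors='ignore')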
covid_time_series_C = covid_time_series_C.melt(
id_vars=['Country/Region','Province/State'],
value_vars=dates,
var_name='Date',
value_name='Confirmed'
)
covid_time_series_C
| | Country/Region | Province/State | Date | Confirmed |
|---|---|---|---|---|
| 0 | Afghanistan | 0 | 1/23/20 | 0 |
| 1 | Albania | 0 | 1/23/20 | 0 |
| 2 | Algeria | 0 | 1/23/20 | 0 |
| 3 | Andorra | 0 | 1/23/20 | 0 |
| 4 | Angola | 0 | 1/23/20 | 0 |
| ... | ... | ... | ... | ... |
| 27925 | Western Sahara | 0 | 5/6/20 | 6 |
| 27926 | Sao Tome and Principe | 0 | 5/6/20 | 174 |
| 27927 | Yemen | 0 | 5/6/20 | 25 |
| 27928 | Comoros | 0 | 5/6/20 | 8 |
| 27929 | Tajikistan | 0 | 5/6/20 | 379 |
27930 rows × 4 columns
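The reshape done by `melt` can be seen on a two-country toy table. Picking the date columns by name with `Index.drop` instead of by position avoids off-by-one slicing: once Lat and Long have been dropped, `columns[3:]` starts at the second date column, which is consistent with the melted output above beginning at 1/23/20 and having 27930 = 266 × 105 rows instead of 266 × 106. The frame names `wide` and `long_df` are illustrative; the values come from the confirmed-cases output.

```python
import pandas as pd

# Toy wide table: one column per date
wide = pd.DataFrame({
    'Province/State': [0, 0],
    'Country/Region': ['Afghanistan', 'Albania'],
    '5/5/20': [3224, 820],
    '5/6/20': [3392, 832],
})

id_cols = ['Country/Region', 'Province/State']
# All non-identifier columns are dates, selected by name rather than position
dates = wide.columns.drop(id_cols)

# One row per (country, date) pair
long_df = wide.melt(id_vars=id_cols, value_vars=dates,
                    var_name='Date', value_name='Confirmed')
print(long_df)
```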
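# Every non-identifier column is a date; selecting by name avoids skipping the first date
dates = covid_time_series_D.columns.drop(['Country/Region', 'Province/State', 'Lat', 'Long'], errors='ignore')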
covid_time_series_D = covid_time_series_D.melt(
id_vars=['Country/Region','Province/State'],
value_vars=dates,
var_name='Date',
value_name='Deaths'
)
covid_time_series_D = covid_time_series_D.groupby(['Country/Region', 'Date'], as_index=False)['Deaths'].sum()
covid_time_series_D
| | Country/Region | Date | Deaths |
|---|---|---|---|
| 0 | Afghanistan | 1/23/20 | 0 |
| 1 | Afghanistan | 1/24/20 | 0 |
| 2 | Afghanistan | 1/25/20 | 0 |
| 3 | Afghanistan | 1/26/20 | 0 |
| 4 | Afghanistan | 1/27/20 | 0 |
| ... | ... | ... | ... |
| 19630 | Zimbabwe | 5/2/20 | 4 |
| 19631 | Zimbabwe | 5/3/20 | 4 |
| 19632 | Zimbabwe | 5/4/20 | 4 |
| 19633 | Zimbabwe | 5/5/20 | 4 |
| 19634 | Zimbabwe | 5/6/20 | 4 |
19635 rows × 3 columns
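# Every non-identifier column is a date; selecting by name avoids skipping the first date
dates = covid_time_series_covid_19_R.columns.drop(['Country/Region', 'Province/State', 'Lat', 'Long'], errors='ignore')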
covid_time_series_covid_19_R= covid_time_series_covid_19_R.melt(
id_vars=['Country/Region','Province/State'],
value_vars=dates,
var_name='Date',
value_name='Recovered'
)
The world data also contain province-level rows for some countries, so the rows are grouped by country and date and summed.
Selecting one column after a plain groupby would return a Series indexed by country and date; passing as_index=False keeps Country/Region and Date as ordinary columns, so the result is a flat DataFrame.
covid_time_series_covid_19_R = covid_time_series_covid_19_R.groupby(['Country/Region', 'Date'], as_index=False)['Recovered'].sum()
covid_time_series_covid_19_R
| | Country/Region | Date | Recovered |
|---|---|---|---|
| 0 | Afghanistan | 1/23/20 | 0 |
| 1 | Afghanistan | 1/24/20 | 0 |
| 2 | Afghanistan | 1/25/20 | 0 |
| 3 | Afghanistan | 1/26/20 | 0 |
| 4 | Afghanistan | 1/27/20 | 0 |
| ... | ... | ... | ... |
| 19630 | Zimbabwe | 5/2/20 | 5 |
| 19631 | Zimbabwe | 5/3/20 | 5 |
| 19632 | Zimbabwe | 5/4/20 | 5 |
| 19633 | Zimbabwe | 5/5/20 | 5 |
| 19634 | Zimbabwe | 5/6/20 | 5 |
19635 rows × 3 columns
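The effect of the province aggregation can be checked on a toy long-format frame in which one hypothetical country (CountryA) is split into two provinces; the country and province names and the counts are made up for illustration.

```python
import pandas as pd

# Hypothetical long-format rows: CountryA is split into provinces P1 and P2
long_df = pd.DataFrame({
    'Country/Region': ['CountryA'] * 4 + ['CountryB'] * 2,
    'Province/State': ['P1', 'P2', 'P1', 'P2', 0, 0],
    'Date': ['5/5/20', '5/5/20', '5/6/20', '5/6/20', '5/5/20', '5/6/20'],
    'Recovered': [10, 5, 12, 6, 3, 4],
})

# Sum provinces into country totals; as_index=False returns a flat DataFrame
by_country = long_df.groupby(['Country/Region', 'Date'], as_index=False)['Recovered'].sum()
print(by_country)
```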
* Merge covid_time_series_C, covid_time_series_D and covid_time_series_covid_19_R using the merge function.
* Calculate Death_percentage and Recovered_percentage in the covid_time_series dataframe.
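The merge and percentage code is not shown in the notebook. A minimal sketch, assuming an inner merge of the three per-country frames on Country/Region and Date, with percentages taken relative to confirmed cases; the frame names `confirmed`, `deaths` and `recovered` are illustrative. For the Zimbabwe rows in the covid_time_series output (34 confirmed, 4 deaths, 5 recovered), the recorded values 0.001176 and 0.001471 equal Deaths / Confirmed / 100 and Recovered / Confirmed / 100; the sketch multiplies by 100 instead, which gives about 11.76 and 14.71. Rows with no cases (0/0) produce NaN, which is replaced by 0 here.

```python
import pandas as pd

keys = ['Country/Region', 'Date']
# Toy frames in the layout of the grouped outputs (Zimbabwe values from the table)
confirmed = pd.DataFrame({'Country/Region': ['Afghanistan', 'Zimbabwe'],
                          'Date': ['1/23/20', '5/6/20'], 'Confirmed': [0, 34]})
deaths = pd.DataFrame({'Country/Region': ['Afghanistan', 'Zimbabwe'],
                       'Date': ['1/23/20', '5/6/20'], 'Deaths': [0, 4]})
recovered = pd.DataFrame({'Country/Region': ['Afghanistan', 'Zimbabwe'],
                          'Date': ['1/23/20', '5/6/20'], 'Recovered': [0, 5]})

covid_time_series = (confirmed.merge(deaths, on=keys, how='inner')
                              .merge(recovered, on=keys, how='inner'))

# Share of confirmed cases, in percent; 0/0 (no cases yet) -> NaN -> 0
covid_time_series['Death_percentage'] = (
    100 * covid_time_series['Deaths'] / covid_time_series['Confirmed']).fillna(0)
covid_time_series['Recovered_percentage'] = (
    100 * covid_time_series['Recovered'] / covid_time_series['Confirmed']).fillna(0)
print(covid_time_series.round(2))
```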
Because cases started appearing in different countries at different times, comparing countries on the same calendar date is not a fair way to analyse the growth rate of the outbreak. I therefore calculated the date of the first positive case in each country and added it as a column in the dataframe. Using this date, I assigned to each row the number of days passed since the first case. Countries can now be compared by the number of days since their first case.
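A sketch of the days-since-first-case calculation, assuming Date is parsed as month/day/year, that the first case is the first date with Confirmed > 0, and that rows before the first case get Day = 0 (as the India rows before its first case show). The toy countries A and B and the column name First_case_date are hypothetical.

```python
import pandas as pd

# Toy long-format data: two countries whose first case appears on different dates
ts_days = pd.DataFrame({
    'Country/Region': ['A'] * 4 + ['B'] * 4,
    'Date': ['1/28/20', '1/29/20', '1/30/20', '1/31/20'] * 2,
    'Confirmed': [0, 1, 3, 7, 0, 0, 0, 2],
})
ts_days['Date'] = pd.to_datetime(ts_days['Date'], format='%m/%d/%y')

# First date with at least one confirmed case, per country
first_case = (ts_days[ts_days['Confirmed'] > 0]
              .groupby('Country/Region')['Date'].min()
              .rename('First_case_date')
              .reset_index())
ts_days = ts_days.merge(first_case, on='Country/Region', how='left')

# Days elapsed since the first case; rows before it are clipped to 0
ts_days['Day'] = (ts_days['Date'] - ts_days['First_case_date']).dt.days.clip(lower=0)
print(ts_days)
```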
#covid_time_seriesChina = covid_time_series[covid_time_series['Country/Region'] == 'China']
#covid_time_seriesChina.head(50)
covid_time_series
| | Country/Region | Date | Confirmed | Deaths | Recovered | Death_percentage | Recovered_percentage |
|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 1/23/20 | 0 | 0 | 0 | NaN | NaN |
| 1 | Afghanistan | 1/24/20 | 0 | 0 | 0 | NaN | NaN |
| 2 | Afghanistan | 1/25/20 | 0 | 0 | 0 | NaN | NaN |
| 3 | Afghanistan | 1/26/20 | 0 | 0 | 0 | NaN | NaN |
| 4 | Afghanistan | 1/27/20 | 0 | 0 | 0 | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 19630 | Zimbabwe | 5/2/20 | 34 | 4 | 5 | 0.001176 | 0.001471 |
| 19631 | Zimbabwe | 5/3/20 | 34 | 4 | 5 | 0.001176 | 0.001471 |
| 19632 | Zimbabwe | 5/4/20 | 34 | 4 | 5 | 0.001176 | 0.001471 |
| 19633 | Zimbabwe | 5/5/20 | 34 | 4 | 5 | 0.001176 | 0.001471 |
| 19634 | Zimbabwe | 5/6/20 | 34 | 4 | 5 | 0.001176 | 0.001471 |
19635 rows × 7 columns
* Replace NaN values with zero in Death_percentage and Recovered_percentage in covid_time_series.
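The NaN values arise from 0 / 0 on dates before a country's first confirmed case. A minimal sketch of replacing them only in the two ratio columns, on hypothetical toy rows:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: the first has 0 confirmed cases, so its ratios are 0/0 = NaN.
covid_time_series = pd.DataFrame({
    "Confirmed": [0, 34],
    "Death_percentage": [np.nan, 11.76],
    "Recovered_percentage": [np.nan, 14.71],
})
cols = ["Death_percentage", "Recovered_percentage"]
# Replace NaN only in the two ratio columns, leaving every other column untouched.
covid_time_series[cols] = covid_time_series[cols].fillna(0)
print(covid_time_series)
```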
covid_time_series_I = covid_time_series[covid_time_series['Country/Region']=='India']
covid_time_series_I.head(5)
| | Country/Region | Date | Confirmed | Deaths | Recovered | Death_percentage | Recovered_percentage | Day |
|---|---|---|---|---|---|---|---|---|
| 8295 | India | 1/23/20 | 0 | 0 | 0 | 0.0 | 0.0 | 0 |
| 8296 | India | 1/24/20 | 0 | 0 | 0 | 0.0 | 0.0 | 0 |
| 8297 | India | 1/25/20 | 0 | 0 | 0 | 0.0 | 0.0 | 0 |
| 8298 | India | 1/26/20 | 0 | 0 | 0 | 0.0 | 0.0 | 0 |
| 8299 | India | 1/27/20 | 0 | 0 | 0 | 0.0 | 0.0 | 0 |
fig = px.scatter(full_data,x="total_cases",y="total_deaths",color='country',log_x=True ,log_y=True ,size_max=100, range_x=[1,1000000000],range_y=[1,1000000])
fig.update_traces(textposition='top center')
fig.update_layout(
# height=800,width=1000,
title_text='Total Deaths Cases in the world',xaxis = dict(
tickangle = 90,
title_text = "Total_cases",
title_font = {"size": 15},
title_standoff = 10),
yaxis = dict(
title_text = "Total_Deaths Cases",
title_standoff = 10)
)
fig.show()
- The plot shows country-wise total deaths against total confirmed cases on logarithmic axes.
As we can see in the graph, the top five countries on the basis of total confirmed cases, total deaths and recovered cases are
fig = px.scatter(full_data,x="total_cases",y="total_recovered",color='country', log_x=True ,log_y=True ,size_max=100, range_x=[1,10000000],range_y=[1,1000000])
fig.update_traces(textposition='top center')
fig.update_layout(
# height=800,width=1000,
title_text='Total Recovered Cases in the world',
xaxis = dict(
tickangle = 90,
title_text = "Total_cases",
title_font = {"size": 15},
title_standoff = 10),
yaxis = dict(
title_text = "Total_recovered Cases",
title_standoff = 10)
)
fig.show()
* The plot shows country-wise recovered cases against total confirmed cases on a logarithmic scale. * All countries are included in the analysis, so the points overlap in the scatter plot.
- The statsmodels module, which provides classes and functions for estimating regression models, conducting statistical tests and exploring data statistically, was used to analyse the COVID-19 data for all countries ('total_cases', 'total_deaths', 'new_cases', 'new_deaths', 'total_recovered', 'active_cases').
Assumptions of a regression model:
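The first model fitted below can be written as follows, with β̂₀ the intercept and β̂₁ to β̂₃ the slopes used in the interpretation after it. The assumptions listed after the equations are the standard textbook OLS assumptions; the notebook does not test them explicitly.

$$
\text{total\_cases}_i = \beta_0 + \beta_1\,\text{total\_deaths}_i + \beta_2\,\text{new\_cases}_i + \beta_3\,\text{new\_deaths}_i + \varepsilon_i, \qquad i = 1,\dots,n
$$

$$
\hat{\boldsymbol\beta} = \arg\min_{\boldsymbol\beta} \sum_{i=1}^{n}\bigl(y_i - \mathbf{x}_i^{\top}\boldsymbol\beta\bigr)^2 = (X^{\top}X)^{-1}X^{\top}\mathbf{y}
$$

- Linearity: the expected value of total_cases is a linear function of the predictors.
- Independence: the errors ε_i are independent of each other.
- Homoscedasticity: the errors have constant variance.
- Normality: the errors are approximately normally distributed (this matters for the t- and F-test p-values).
- No perfect multicollinearity: X^T X is invertible. The large condition numbers reported in the summaries indicate strongly, though not perfectly, correlated predictors.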
import statsmodels.api as sm
# Fit an OLS model with intercept: total_cases regressed on total_deaths, new_cases and new_deaths
X = full_data[['total_deaths','new_cases','new_deaths']]
y = full_data['total_cases']
X = sm.add_constant(X)
est = sm.OLS(y, X).fit()
print(est.summary())
OLS Regression Results
==============================================================================
Dep. Variable: total_cases R-squared: 0.963
Model: OLS Adj. R-squared: 0.962
Method: Least Squares F-statistic: 1822.
Date: Sun, 10 May 2020 Prob (F-statistic): 5.23e-150
Time: 02:17:33 Log-Likelihood: -2411.3
No. Observations: 214 AIC: 4831.
Df Residuals: 210 BIC: 4844.
Df Model: 3
Covariance Type: nonrobust
================================================================================
coef std err t P>|t| [0.025 0.975]
--------------------------------------------------------------------------------
const -249.7789 1346.307 -0.186 0.853 -2903.787 2404.229
total_deaths 5.7334 0.563 10.186 0.000 4.624 6.843
new_cases 19.1134 1.938 9.862 0.000 15.293 22.934
new_deaths 356.5852 58.874 6.057 0.000 240.526 472.644
==============================================================================
Omnibus: 167.889 Durbin-Watson: 1.552
Prob(Omnibus): 0.000 Jarque-Bera (JB): 10347.060
Skew: -2.313 Prob(JB): 0.00
Kurtosis: 36.749 Cond. No. 7.21e+03
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 7.21e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
* The coefficients β̂₀ to β̂₃ are chosen to minimize the sum of squared differences between the predicted and the actual values of total_cases.
To understand the trends we look at the slopes of total_deaths, new_cases and new_deaths on a linear scale. The regression coefficient (coef) is the change in the dependent variable for a one-unit change in a predictor, with all other predictors held constant. From the results above:
• The intercept β̂₀ = -249.78 (p = 0.853, not significantly different from zero).
• The slope β̂₁ (total_deaths) = 5.7334.
• The slope β̂₂ (new_cases) = 19.1134.
• The slope β̂₃ (new_deaths) = 356.5852.
• All three slope estimates are positive: in line with our assumptions, higher total_deaths, new_cases and new_deaths are associated with higher total_cases. This is an association, not evidence that one variable causes the other.
The p-value for total_deaths (shown as 0.000) is the probability of observing a t-statistic at least as extreme as 10.186 if the true total_deaths coefficient were zero. Because it is below 0.001, we reject the hypothesis that total_deaths has no relationship with total_cases.
** The standard error measures the precision of the total_deaths coefficient by estimating how much the coefficient would vary if the same model were fitted on a different sample. Our standard error, 0.563, is small relative to the coefficient (t = 10.186), so the estimate appears precise.
fig = plt.figure(figsize =(15,8))
results = smf.ols('total_cases~total_deaths+new_cases+new_deaths',data = full_data).fit()
sm.graphics.plot_regress_exog(results, 'total_deaths', fig=fig)
plt.show()
1. The "Y and Fitted vs. X" plot shows the observed total_cases and the model's fitted values against total_deaths.
2. The "Residuals versus total_deaths" plot shows the model's errors against the specified predictor. Each dot is the residual of one observation; the horizontal line marks zero error.
3. The "Partial regression plot" shows the relationship between total_cases and total_deaths after the effect of the other independent variables (new_cases and new_deaths) has been removed from both; the slope of its fitted line equals the total_deaths coefficient.
4. The Component and Component-Plus-Residual (CCPR) plot extends the partial regression plot: it plots β̂₁·total_deaths plus the residuals against total_deaths, and the line β̂₁·total_deaths (the "component") shows where the fitted line lies once the other independent variables are accounted for.
X = full_data[['total_recovered','new_cases','active_cases']]
# Fit an OLS model with intercept: total_cases regressed on total_recovered, new_cases and active_cases
y = full_data['total_cases']
X = sm.add_constant(X)
est = sm.OLS(y, X).fit()
print(est.summary())
OLS Regression Results
==============================================================================
Dep. Variable: total_cases R-squared: 0.999
Model: OLS Adj. R-squared: 0.999
Method: Least Squares F-statistic: 1.297e+05
Date: Sun, 10 May 2020 Prob (F-statistic): 0.00
Time: 02:17:35 Log-Likelihood: -1958.8
No. Observations: 214 AIC: 3926.
Df Residuals: 210 BIC: 3939.
Df Model: 3
Covariance Type: nonrobust
===================================================================================
coef std err t P>|t| [0.025 0.975]
-----------------------------------------------------------------------------------
const 148.9922 164.783 0.904 0.367 -175.849 473.834
total_recovered 1.0992 0.009 128.820 0.000 1.082 1.116
new_cases -1.4876 0.259 -5.752 0.000 -1.997 -0.978
active_cases 1.0893 0.006 188.678 0.000 1.078 1.101
==============================================================================
Omnibus: 276.426 Durbin-Watson: 1.539
Prob(Omnibus): 0.000 Jarque-Bera (JB): 19360.057
Skew: 5.492 Prob(JB): 0.00
Kurtosis: 48.283 Cond. No. 7.94e+04
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 7.94e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
* The coefficients β̂₀ to β̂₃ are chosen to minimize the sum of squared differences between the predicted and the actual values of total_cases.
To understand the trends we look at the slopes of total_recovered, new_cases and active_cases on a linear scale. The regression coefficient (coef) is the change in the dependent variable for a one-unit change in a predictor, with all other predictors held constant. From the results above:
• The intercept β̂₀ = 148.99 (p = 0.367, not significantly different from zero).
• The slope β̂₁ (total_recovered) = 1.0992.
• The slope β̂₂ (new_cases) = -1.4876.
• The slope β̂₃ (active_cases) = 1.0893.
• In line with our assumptions, higher total_recovered and active_cases are associated with higher total_cases. Slopes close to 1 are expected here: active cases are defined as total cases minus deaths minus recovered cases, so the model nearly reproduces an accounting identity, which also explains the very high R-squared (0.999). The negative β̂₂ means that, holding total_recovered and active_cases fixed, more new_cases is associated with slightly fewer total_cases; given the large condition number (7.94e+04), this coefficient should be interpreted with caution.
The p-value for total_recovered (shown as 0.000) is the probability of observing a t-statistic at least as extreme as 128.820 if the true total_recovered coefficient were zero. Because it is below 0.001, we reject the hypothesis that total_recovered has no relationship with total_cases.
** The standard error measures the precision of the total_recovered coefficient by estimating how much the coefficient would vary if the same model were fitted on a different sample. Our standard error, 0.009, is small relative to the coefficient (t = 128.820), so the estimate appears precise.
fig = plt.figure(figsize =(15,8))
results = smf.ols('total_cases~total_recovered + new_cases+active_cases', data = full_data).fit()
#sm.graphics.plot_ccpr_grid(results, fig=fig)
sm.graphics.plot_regress_exog(results, 'total_recovered', fig=fig)
plt.show()
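### endog: the dependent variable (total_cases); exog: the independent variables (total_recovered, new_cases, active_cases). Endogenous means determined by factors within the system; exogenous means determined by factors outside the system.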
1. The "Y and Fitted vs. X" plot shows the observed total_cases and the model's fitted values against total_recovered.
2. The "Residuals versus total_recovered" plot shows the model's errors against the specified predictor. Each dot is the residual of one observation; the horizontal line marks zero error.
3. The "Partial regression plot" shows the relationship between total_cases and total_recovered after the effect of the other independent variables (new_cases and active_cases) has been removed from both; the slope of its fitted line equals the total_recovered coefficient.
4. The Component and Component-Plus-Residual (CCPR) plot extends the partial regression plot: it plots β̂₁·total_recovered plus the residuals against total_recovered, and the line β̂₁·total_recovered (the "component") shows where the fitted line lies once the other independent variables are accounted for.
fig = px.scatter(full_data, y='total_cases', x='active_cases',animation_frame="active_cases",text = "country",range_x =[0,100000],range_y=[0,100000])
fig.update_layout(
# height=800,width=1000,
title_text='Total Active_Cases in All countries',xaxis = dict(
tickangle = 90,
title_text = "Active_Cases",
title_font = {"size": 15},
title_standoff = 10),
yaxis = dict(
title_text = "Total_Cases",
title_standoff = 10)
)
fig.show()
fig2 = px.scatter(full_data, y='total_cases', x='serious', animation_frame="serious",text ='country',range_x =[0,10000],range_y=[0,100000])
fig2.update_layout(
#height=800,width=1000,
title_text='Total Serious Cases in the world',xaxis = dict(
tickangle = 90,
title_text = "Serious_cases",
title_font = {"size": 15},
title_standoff = 10),
yaxis = dict(
title_text = "Total_Cases",
title_standoff = 10)
)
fig2.show()
fig = px.scatter(full_data, x='tot\xa0cases/1m_pop', y='deaths/1m_pop', color='country',log_x=True ,log_y=True ,size_max=45)
fig.update_layout(
# height=800,width=1000,
title_text='Total Deaths Cases/ 1m_pop in the world',xaxis = dict(
tickangle = 90,
title_text = "Total_cases/1m_pop",
title_font = {"size": 15},
title_standoff = 10),
yaxis = dict(
title_text = "Total_Deaths Cases/1m_pop",
title_standoff = 10)
)
fig.show()
fig2 = px.scatter(full_data, x='tot\xa0cases/1m_pop', y='tests/_1m_pop', color='country',log_x=True ,log_y=True ,size_max=45)
fig2.update_layout(
#height=800,width=1000,
title_text='Total Test/1m_pop in the world',xaxis = dict(
tickangle = 90,
title_text = "Total_cases/1m_pop",
title_font = {"size": 15},
title_standoff = 10),
yaxis = dict(
title_text = "Total_Test/1m_pop",
title_standoff = 10)
)
fig2.show()
full_data.fillna(0, inplace=True)
full_data.head(10)
fig = px.scatter_mapbox(full_data, lat="lat", lon="long",color = 'country', hover_name="country", hover_data=['total_cases', "total_deaths"],
color_continuous_scale=px.colors.cyclical.IceFire,
animation_frame='total_cases',size_max=55, zoom=3)
fig.update_layout(title_text="Total_Cases in World")
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
fig = px.scatter_mapbox(full_data, lat="lat", lon="long",color = 'country', hover_name="country", hover_data=['total_cases',"total_deaths" ,"total_recovered"],
color_continuous_scale=px.colors.cyclical.IceFire,
animation_frame='total_deaths',zoom=3)
fig.update_layout(title_text="Total_Deaths in World")
fig.update_layout(
mapbox_style="white-bg",
mapbox_layers=[
{
"below": 'traces',
"sourcetype": "raster",
"source": [
"https://basemap.nationalmap.gov/arcgis/rest/services/USGSImageryOnly/MapServer/tile/{z}/{y}/{x}"
]
}
])
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
full_data=pd.melt(full_data, id_vars=['country','tests/_1m_pop'], value_vars=['total_cases', 'new_cases', 'total_deaths', 'new_deaths', 'total_recovered'])
# plotly
fig = px.line(full_data, x='country', y='value', color='variable',log_x=False ,log_y=True)
fig.update_layout(
# height=800,width=1000,
title_text='Total Cases in the world',xaxis = dict(
tickangle = 90,
title_text = "Country",
title_font = {"size": 15},
title_standoff = 10),
yaxis = dict(
title_text = "Value",
title_standoff = 10)
)
# Show plot
fig.show()
base_color = sns.color_palette()[1]
plt.figure(figsize=(32,6))
g = sns.countplot(data = covid_india, x ='State/UnionTerritory', color = base_color)
g.set_xticklabels(g.get_xticklabels(), rotation=45)
g.set_title('Covid_19 Analysis Based on State/UnionTerritory')
n_points = covid_india.shape[0]
cat_counts = covid_india['State/UnionTerritory'].value_counts()
locs, labels = plt.xticks()
for loc, label in zip(locs, labels):
count = cat_counts[label.get_text()]
pct_string = '{:0.1f}%'.format(100*count/n_points)
plt.text(loc, count-8, pct_string, ha = 'left',va='bottom', color = 'black')
fig = px.scatter(covid_india,x="Confirmed",y="Deaths" ,animation_frame="Deaths", animation_group="State/UnionTerritory",color="State/UnionTerritory",log_x=True ,log_y=True , range_x=[1,10000],range_y=[1,10000])
fig.update_traces(textposition='top center')
fig.update_layout(
#height=800,width=1000,
title_text='Total Deaths Cases in India',xaxis = dict(
tickangle = 90,
title_text = "Total_cases",
title_font = {"size": 15},
title_standoff = 10),
yaxis = dict(
title_text = "Total_Deaths Cases",
title_standoff = 10),
)
fig.show()
fig = px.scatter(covid_india,x="Confirmed",y="Cured", animation_frame="Cured", animation_group="State/UnionTerritory",color="State/UnionTerritory", log_x=True ,log_y=True ,size_max=45, range_x=[1,10000],range_y=[1,10000])
fig.update_traces(textposition='top center')
fig.update_layout(
#height=800,width=1000,
title_text='Total Recovered Cases in India',xaxis = dict(
tickangle = 90,
title_text = "Total_cases",
title_font = {"size": 15},
title_standoff = 10),
yaxis = dict(
title_text = "Total Recovered Cases",
title_standoff = 10)
)
fig.show()
- The statsmodels module, which provides classes and functions for estimating regression models, conducting statistical tests and exploring data statistically, is used to analyse the COVID-19 cases in India ('Confirmed', 'Deaths' and 'Cured').
X = covid_india[['Deaths']]
# Fit an OLS model with intercept: Confirmed regressed on Deaths
y = covid_india['Confirmed']
X = sm.add_constant(X)
est = sm.OLS(y, X).fit()
print(est.summary())
OLS Regression Results
==============================================================================
Dep. Variable: Confirmed R-squared: 0.890
Model: OLS Adj. R-squared: 0.890
Method: Least Squares F-statistic: 1.354e+04
Date: Sun, 10 May 2020 Prob (F-statistic): 0.00
Time: 02:20:31 Log-Likelihood: -12356.
No. Observations: 1671 AIC: 2.472e+04
Df Residuals: 1669 BIC: 2.473e+04
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 112.9189 9.963 11.334 0.000 93.377 132.461
Deaths 22.6790 0.195 116.342 0.000 22.297 23.061
==============================================================================
Omnibus: 1165.847 Durbin-Watson: 1.700
Prob(Omnibus): 0.000 Jarque-Bera (JB): 30900.445
Skew: 2.903 Prob(JB): 0.00
Kurtosis: 23.251 Cond. No. 52.9
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
* β̂₀ and β̂₁ are chosen to minimize the sum of squared differences between the predicted and the actual values of Confirmed.
To understand the trend we look at the slope of Deaths on a linear scale. The regression coefficient (coef) is the change in the dependent variable for a one-unit change in the predictor. From the results above:
• The intercept β̂₀ = 112.92.
• The slope β̂₁ = 22.68: each additional death is associated with about 22.68 more confirmed cases. In line with our assumptions, more deaths are associated with more confirmed cases.
The p-value for Deaths (shown as 0.000) is the probability of observing a t-statistic at least as extreme as 116.342 if the true Deaths coefficient were zero. Because it is below 0.001, we reject the hypothesis that Deaths has no relationship with Confirmed.
** The standard error measures the precision of the Deaths coefficient by estimating how much the coefficient would vary if the same model were fitted on a different sample. Our standard error, 0.195, is small relative to the coefficient, so the estimate appears precise.
fig = plt.figure(figsize =(15,8))
#full_data1= sm.dataset.full_data.load_pandas()
results = smf.ols('Confirmed ~Deaths ', data = covid_india).fit()
#sm.graphics.plot_ccpr_grid(results, fig=fig)
sm.graphics.plot_regress_exog(results, 'Deaths', fig=fig)
plt.show()
1. The "Y and Fitted vs. X" plot shows the observed Confirmed values and the model's fitted values against Deaths.
2. The "Residuals versus Deaths" plot shows the model's errors against the predictor. Each dot is the residual of one observation; the horizontal line marks zero error.
3. The "Partial regression plot" shows the relationship between Confirmed and Deaths after removing the effect of the other independent variables. Because Deaths is the only predictor in this model, it reduces to a plot of Confirmed against Deaths (both centered on their means).
4. The Component and Component-Plus-Residual (CCPR) plot shows β̂₁·Deaths plus the residuals against Deaths; the line β̂₁·Deaths (the "component") shows where the fitted line lies. With a single predictor it carries the same information as the partial regression plot.
X = covid_india[['Cured']]
# Fit an OLS model with intercept: Confirmed regressed on Cured
y = covid_india['Confirmed']
X = sm.add_constant(X)
est = sm.OLS(y, X).fit()
print(est.summary())
OLS Regression Results
==============================================================================
Dep. Variable: Confirmed R-squared: 0.793
Model: OLS Adj. R-squared: 0.793
Method: Least Squares F-statistic: 6388.
Date: Sun, 10 May 2020 Prob (F-statistic): 0.00
Time: 02:20:33 Log-Likelihood: -12886.
No. Observations: 1671 AIC: 2.578e+04
Df Residuals: 1669 BIC: 2.579e+04
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 42.6187 14.004 3.043 0.002 15.151 70.086
Cured 4.2145 0.053 79.925 0.000 4.111 4.318
==============================================================================
Omnibus: 645.833 Durbin-Watson: 1.660
Prob(Omnibus): 0.000 Jarque-Bera (JB): 25483.266
Skew: 1.103 Prob(JB): 0.00
Kurtosis: 22.004 Cond. No. 281.
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
* β̂₀ and β̂₁ are chosen to minimize the sum of squared differences between the predicted and the actual values of Confirmed.
To understand the trend we look at the slope of Cured on a linear scale. The regression coefficient (coef) is the change in the dependent variable for a one-unit change in the predictor. From the results above:
• The intercept β̂₀ = 42.62.
• The slope β̂₁ = 4.21: each additional cured case is associated with about 4.21 more confirmed cases. In line with our assumptions, more cured cases are associated with more confirmed cases.
The p-value for Cured (shown as 0.000) is the probability of observing a t-statistic at least as extreme as 79.925 if the true Cured coefficient were zero. Because it is below 0.001, we reject the hypothesis that Cured has no relationship with Confirmed.
** The standard error measures the precision of the Cured coefficient by estimating how much the coefficient would vary if the same model were fitted on a different sample. Our standard error, 0.053, is small relative to the coefficient, so the estimate appears precise.
fig = plt.figure(figsize =(15,8))
#full_data1= sm.dataset.full_data.load_pandas()
results = smf.ols('Confirmed ~Cured', data = covid_india).fit()
#sm.graphics.plot_ccpr_grid(results, fig=fig)
sm.graphics.plot_regress_exog(results, 'Cured', fig=fig)
plt.show()
1. The "Y and Fitted vs. X" plot shows the observed Confirmed values and the model's fitted values against Cured.
2. The "Residuals versus Cured" plot shows the model's errors against the predictor. Each dot is the residual of one observation; the horizontal line marks zero error.
3. The "Partial regression plot" shows the relationship between Confirmed and Cured after removing the effect of the other independent variables. Because Cured is the only predictor in this model, it reduces to a plot of Confirmed against Cured (both centered on their means).
4. The Component and Component-Plus-Residual (CCPR) plot shows β̂₁·Cured plus the residuals against Cured; the line β̂₁·Cured (the "component") shows where the fitted line lies. With a single predictor it carries the same information as the partial regression plot.
f,g = plt.subplots(figsize = (15,10))
base_color = sns.color_palette()[0]
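# covid_india holds cumulative counts for every state and date, so estimator = max
# draws each state's latest total instead of seaborn's default mean over all dates.
g = sns.barplot(data = covid_india, x = 'Confirmed', y = 'State/UnionTerritory', estimator = max,
                label = 'Total Confirmed Cases', color = base_color)
sns.set_color_codes('muted')
# Lowercase single-letter colors: uppercase 'R' is deprecated (and removed in Matplotlib 3.3).
g = sns.barplot(x = 'Cured', y = 'State/UnionTerritory', data = covid_india, estimator = max,
                label = 'Total number of Cured', color = 'r', edgecolor = 'w')
sns.set_color_codes('pastel')
g = sns.barplot(x = 'Deaths', y = 'State/UnionTerritory', data = covid_india, estimator = max,
                label = 'Total number of Deaths', color = 'g', edgecolor = 'w')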
g.set_title('Analysis of Covid_19 Cases in India')
g.legend(ncol = 3, loc = 'lower right')
#sns.despine(left = True, bottom = True)
plt.show()
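By default sns.barplot draws the mean of all rows in each category. Because covid_india holds one cumulative row per state per date, that mean understates each state's latest total. A minimal pandas sketch of collapsing the data to one row per state before plotting, on hypothetical rows (State A, State B) and assuming the cumulative counts never decrease:

```python
import pandas as pd

# Hypothetical rows in the layout of covid_19_india.csv: cumulative counts per state per date.
covid_india = pd.DataFrame({
    "Date": ["01/05/20", "02/05/20", "03/05/20"] * 2,
    "State/UnionTerritory": ["State A"] * 3 + ["State B"] * 3,
    "Confirmed": [100, 120, 150, 40, 45, 60],
    "Cured": [10, 20, 35, 5, 9, 12],
    "Deaths": [1, 2, 2, 0, 1, 1],
})

# What a default barplot would show: the mean of the cumulative values.
mean_per_state = covid_india.groupby("State/UnionTerritory")["Confirmed"].mean()

# Counts are cumulative, so the maximum per state is its latest total.
latest = (covid_india
          .groupby("State/UnionTerritory")[["Confirmed", "Cured", "Deaths"]]
          .max()
          .sort_values("Confirmed", ascending=False))
print(mean_per_state)
print(latest)
```

The resulting frame can then be passed to sns.barplot(data=latest.reset_index(), x='Confirmed', y='State/UnionTerritory'), so that each bar is a single value rather than an average over dates.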
# Create both panels with the intended size in one call; calling plt.figure() before
# plt.subplots() would leave an extra, empty figure behind.
fig, ax = plt.subplots(1, 2, figsize=(20, 15))
g = sns.scatterplot(data=covid_india, y="Deaths", x="Confirmed", ax=ax[0], hue="State/UnionTerritory", palette="deep")
g.legend(loc='upper right', bbox_to_anchor=(.20, 0.0), ncol=1)
g = sns.scatterplot(data=covid_india, y="Cured", x="Confirmed", ax=ax[1], hue="State/UnionTerritory", palette="deep")
g.legend(loc='upper left', bbox_to_anchor=(1.0, 0.0), ncol=1)
# One title for both panels
fig.suptitle('Covid_19 Deaths Cases and Recovered Cases in India and its States/Union Territories')
plt.show()
fig = px.bar(covid_india_age, y='TotalCases', x='AgeGroup', text='Percentage')
fig.update_traces(texttemplate='%{text}', textposition='outside',marker_color='lightsalmon')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
fig.update_layout(title_text="Analysis Based on Age Group in India")
fig.show()
fig = px.scatter(covid_india_state_testing, x='Date', y='Positive', title='Positive Cases Time Series with Rangeslider', range_x=['2020-01-23','2020-05-04'],range_y = [1,100000])
fig.update_traces(marker_color='indianred')
fig.update_xaxes(rangeslider_visible=True)
fig.update_layout(
# height=800,width=1000,
title_text='Covid_19 Testing in India',xaxis = dict(
tickangle = 90,
title_text = "Date",
title_font = {"size": 15},
title_standoff = 10),
yaxis = dict(
title_text = "Positive",
title_standoff = 10),
)
fig.show()
fig1 = px.scatter(covid_india_state_testing, x='Date', y='Negative', title='Negative Cases Time Series with Rangeslider',range_x = ['2020-01-23','2020-05-04'],range_y = [1,1000000])
fig1.update_traces(marker_color='lightsalmon')
fig1.update_xaxes(rangeslider_visible=True)
fig1.update_layout(
# height=800,width=1000,
title_text='Covid_19 Testing in India',xaxis = dict(
tickangle = 90,
title_text = "Date",
title_font = {"size": 15},
title_standoff = 10),
yaxis = dict(
title_text = "Negative",
title_standoff = 10),
)
fig1.show()
Date range: 22 January 2020 to 6 May 2020
fig = px.scatter(covid_time_series_C, x='Date', y='Confirmed',color="Country/Region", title='Confirmed Cases Time Series with Rangeslider',range_x=['1/22/20','4/29/20'],range_y = [1,1500000])
#fig.update_traces(marker_color='darkslateblue')
fig.update_layout(
# height=800,width=1000,
title_text= 'Datewise Confirmed Cases in all Countries',xaxis = dict(
tickangle = 90,
title_text = "Date",
title_font = {"size": 15},
title_standoff = 10),
yaxis = dict(
title_text = "Confirmed Cases",
title_standoff = 10),
)
fig.update_xaxes(rangeslider_visible=True)
fig.show()
fig = px.scatter(covid_time_series_D, x='Date', y='Deaths',color="Country/Region", title='Deaths Cases Time Series with Rangeslider',range_x=['1/22/20','4/29/20'],range_y = [1,100000])
#fig.update_traces(marker_color='sandybrown')
fig.update_layout(
# height=800,width=1000,
title_text= 'Datewise Deaths Cases in all Countries',xaxis = dict(
tickangle = 90,
title_text = "Date",
title_font = {"size": 15},
title_standoff = 10),
yaxis = dict(
title_text = " Deaths Cases",
title_standoff = 10),
)
fig.update_xaxes(rangeslider_visible=True)
fig.show()
fig = px.scatter(covid_time_series_covid_19_R, x='Date', y='Recovered',color="Country/Region", title='Recovered Cases Time Series with Rangeslider',range_x=['1/22/20','4/29/20'],range_y = [1,1000000])
#fig.update_traces(marker_color='green')
fig.update_layout(
#height=800,width=1000,
title_text= 'Datewise Recovered Cases in all Countries',xaxis = dict(
tickangle = 90,
title_text = "Date",
title_font = {"size": 15},
title_standoff = 10),
yaxis = dict(
title_text = "Recovered Cases",
title_standoff = 10),
)
fig.update_xaxes(rangeslider_visible=True)
fig.show()
Date range: 22 January 2020 to 6 May 2020
fig1 = px.scatter(covid_time_series_I, x='Date', y='Confirmed',color="Country/Region", title='Confirmed Cases Time Series with Rangeslider',range_x=['1/22/20','4/29/20'],range_y = [1,10000])
fig1.update_layout(
# height=800,width=1000,
title_text= 'Datewise Confirmed Cases in India',xaxis = dict(
tickangle = 90,
title_text = "Date",
title_font = {"size": 15},
title_standoff = 10),
yaxis = dict(
title_text = "Confirmed Cases",
title_standoff = 10),
)
fig1.update_xaxes(rangeslider_visible=True)
fig1.show()
fig2 = px.scatter(covid_time_series_I, x='Date', y='Deaths',color="Country/Region",range_x=['1/22/20','4/29/20'],range_y = [1,10000])
fig2.update_layout(
#height=800,width=1000,
title_text= 'Datewise Deaths Cases in India ',xaxis = dict(
tickangle = 90,
title_text = "Date",
title_font = {"size": 15},
title_standoff = 10),
yaxis = dict(
title_text = "Deaths Cases",
title_standoff = 10),
)
fig2.update_xaxes(rangeslider_visible=True)
fig2.show()
fig3 = px.scatter(covid_time_series_I, x='Date', y='Recovered',color="Country/Region", title='Recovered Cases Time Series with Rangeslider',range_x=['1/22/20','4/29/20'],range_y = [1,10000])
fig3.update_layout(
#height=800,width=1000,
title_text= 'Datewise Recovered Cases in India ',xaxis = dict(
tickangle = 90,
title_text = "Date",
title_font = {"size": 15},
title_standoff = 10),
yaxis = dict(
title_text = "Recovered Cases",
title_standoff = 10),
)
fig3.update_xaxes(rangeslider_visible=True)
fig3.show()
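Which countries had the highest numbers of confirmed cases, deaths and recovered cases on a given day? The plots below put the Day column (days since each country's first case) on the x-axis, so countries are compared at the same stage of their outbreak.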
fig = px.scatter(covid_time_series, x='Day', y='Confirmed',color="Country/Region",log_x=False ,log_y=True )
fig.update_layout(
#height=800,width=1000,
title_text= 'Daywise Confirmed Cases in All Countries ',xaxis = dict(
tickangle = 90,
title_text = "Day",
title_font = {"size": 15},
title_standoff = 10),
yaxis = dict(
title_text = "Confirmed Cases",
title_standoff = 10),
)
fig.show()
fig1 = px.scatter(covid_time_series, x='Day', y='Deaths',color="Country/Region",log_x=False ,log_y=True )
fig1.update_layout(
#height=800,width=1000,
title_text= 'Daywise Deaths Cases in All Countries ',xaxis = dict(
tickangle = 90,
title_text = "Day",
title_font = {"size": 15},
title_standoff = 10),
yaxis = dict(
title_text = "Deaths Cases",
title_standoff = 10),
)
fig1.show()
fig2 = px.scatter(covid_time_series, x='Day', y='Recovered',color="Country/Region",log_x=False ,log_y=True )
fig2.update_layout(
#height=800,width=1000,
title_text= 'Daywise Recovered Cases in All Countries ',xaxis = dict(
tickangle = 90,
title_text = "Day",
title_font = {"size": 15},
title_standoff = 10),
yaxis = dict(
title_text = "Recovered Cases",
title_standoff = 10),
)
fig2.show()
- Analysis of curated data on the COVID-19 viral disease, to understand the magnitude of the risk posed by this infectious disease in all countries:
- The first COVID-19 cases were reported in Wuhan, a city in central China with a population of over 11 million people, where the outbreak began. The city was placed under lockdown on January 23, 2020.
- The current official estimate of the incubation period of COVID-19 is 2-14 days.
- In January, few novel coronavirus cases were reported in the UK, Russia, Sweden and Spain.
- In March and April, confirmed cases and deaths increased sharply all over the world.
** Total confirmed cases in the world = 3,913,644
** Total deaths in the world = 270,426
** Total recovered cases in the world = 1,341,022
** Total confirmed cases in India = 56,342
** Total deaths in India = 1,886
** Total recovered cases in India = 16,540
(* figures are the latest available when this conclusion was written)
Age and conditions of coronavirus cases in India. Reported testing totals: TotalSamples = 1.57851e+07, Negative = 1.30916e+07, Positive = 571,595.
The time-series data show that confirmed cases increased rapidly, and that the death rate also increased in many countries.
def hide_code_in_slideshow():
from IPython import display
import binascii
import os
uid = binascii.hexlify(os.urandom(8)).decode()
html = """<div id="%s"></div>
<script type="text/javascript">
$(function(){
var p = $("#%s");
if (p.length==0) return;
while (!p.hasClass("cell")) {
p=p.parent();
if (p.prop("tagName") =="body") return;
}
var cell = p;
cell.find(".input").addClass("hide-in-slideshow")
});
</script>""" % (uid, uid)
display.display_html(html, raw=True)
hide_code_in_slideshow()